SVM-based Automatic Annotation of Multiple Sequence Alignments

Author

  • Jiansi Ren
Abstract

Multiple sequence alignment is a critical step in phylogeny inference. No existing approach is capable of both 1) finding the best global alignment and 2) automating and reproducing manual editing. Progressive alignment is an effective method for multiple sequence alignment. However, its practical application has long been hampered because maximizing the alignment score can align regions that are not homologous. Standard practice in phylogenetics therefore involves manual editing of alignments, which is a non-trivial task. To address these problems, this study 1) uses an SVM that captures the neighborhood of a site to automate and reproduce manual editing, and 2) builds a procedure for SVM model training and automatic annotation. Experimental results demonstrate that an SVM-based classifier can reproduce the manual editing tasks with an accuracy of 95.5%. The method is robust to the choice of both RBF kernel parameters (gamma and C) and clearly outperforms GBLOCKS and AL2CO, which are conventional editing/annotating methods. In the reported experiments, the classification accuracy achieved by the proposed method is consistently much higher than that achieved by these counterpart methods.
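The idea of capturing the neighborhood of a site can be illustrated with a minimal sketch. The abstract does not state which features the author uses, so everything here is an assumption made for illustration: the two per-column features (gap fraction and frequency of the most common residue), the window half-width, the zero padding at the alignment ends, and the function names `column_features`, `neighborhood_features`, and `rbf_kernel` are all hypothetical. The sketch only builds the feature vectors and shows the RBF kernel in which gamma appears; C, the soft-margin penalty, would be set in whichever SVM trainer consumes these vectors.

```python
import math
from collections import Counter
from typing import List

GAP = "-"


def column_features(column: str) -> List[float]:
    """Two illustrative (assumed) features for one alignment column:
    the gap fraction and the frequency of the most common residue."""
    n = len(column)
    residues = [c for c in column if c != GAP]
    gap_fraction = (n - len(residues)) / n
    if residues:
        top_count = Counter(residues).most_common(1)[0][1]
        conservation = top_count / n
    else:
        conservation = 0.0
    return [gap_fraction, conservation]


def neighborhood_features(alignment: List[str], half_width: int = 2) -> List[List[float]]:
    """Return one feature vector per site, built by concatenating the
    features of the site and of its neighbors within half_width columns.
    Positions beyond the alignment ends are zero-padded (an assumption)."""
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    per_column = [
        column_features("".join(row[j] for row in alignment))
        for j in range(n_cols)
    ]
    padding = [0.0, 0.0]
    vectors = []
    for j in range(n_cols):
        vec: List[float] = []
        for k in range(j - half_width, j + half_width + 1):
            vec.extend(per_column[k] if 0 <= k < n_cols else padding)
        vectors.append(vec)
    return vectors


def rbf_kernel(x: List[float], y: List[float], gamma: float) -> float:
    """RBF kernel exp(-gamma * ||x - y||^2) between two feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


if __name__ == "__main__":
    toy_alignment = ["AC-T", "ACGT", "A--T"]  # toy data, not from the paper
    for site, vec in enumerate(neighborhood_features(toy_alignment, half_width=1)):
        print(site, [round(v, 3) for v in vec])
```

Each site's vector could then be paired with a manual keep/remove label and passed to an RBF-kernel SVM; the windowed design means a poorly aligned column flanked by gappy columns looks different to the classifier than the same column in a conserved block.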

Similar resources

Reproducing the manual annotation of multiple sequence alignments using a SVM classifier

MOTIVATION: Aligning protein sequences with the best possible accuracy requires sophisticated algorithms. Since the optimal alignment is not guaranteed to be the correct one, it is expected that even the best alignment will contain sites that do not respect the assumption of positional homology. Because formulating rules to identify these sites is difficult, it is common practice to manually rem...

A CAD System Framework for the Automatic Diagnosis and Annotation of Histological and Bone Marrow Images

Owing to the ever-increasing volume of medical image data in the world’s medical centers and recent developments in medical imaging hardware and technology, software analysis of medical data has become necessary. Equipping medical science with intelligent tools for the diagnosis and treatment of illnesses has reduced physicians’ errors and physical and financial damages. In this article we pr...

Automatic assessment of alignment quality

Multiple sequence alignments play a central role in the annotation of novel genomes. Given the biological and computational complexity of this task, the automatic generation of high-quality alignments remains challenging. Since multiple alignments are usually employed at the very start of data analysis pipelines, it is crucial to ensure high alignment quality. We describe a simple, yet elegant,...

Problems and pitfalls of automatic gene annotation, gene collection, domain prediction, and sequence alignment

Because of the following problems within the automatic gene annotation process it is absolutely necessary to manually check and annotate all genes. Almost every myosin gene prediction and its translation produced by the automatic processes contains errors derived from including intronic sequence and leaving out exons, as well as wrong predictions of start and termination sites. It is also absol...

Hairpins in a Haystack: recognizing microRNA precursors in comparative genomics data

Recently, genome-wide surveys for non-coding RNAs have provided evidence for tens of thousands of previously undescribed evolutionary conserved RNAs with distinctive secondary structures. The annotation of these putative ncRNAs, however, remains a difficult problem. Here we describe an SVM-based approach that, in conjunction with a non-stringent filter for consensus secondary structu...

Journal title:
  • JCP

Volume 9  Issue 

Pages  -

Publication date 2014